This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

source("tianfengRwrappers.R")
Registered S3 methods overwritten by 'htmltools':
  method               from         
  print.html           tools:rstudio
  print.shiny.tag      tools:rstudio
  print.shiny.tag.list tools:rstudio
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
Registered S3 method overwritten by 'htmlwidgets':
  method           from         
  print.htmlwidget tools:rstudio
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: reticulate
Loading required package: tidyr

Attaching package: ‘MySeuratWrappers’

The following objects are masked from ‘package:Seurat’:

    DimPlot, DoHeatmap, LabelClusters, RidgePlot, VlnPlot


Attaching package: ‘cowplot’

The following object is masked from ‘package:ggpubr’:

    get_legend

Loading required package: viridisLite

Attaching package: ‘reshape2’

The following object is masked from ‘package:tidyr’:

    smiths

NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
      Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
      if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow

Registered S3 method overwritten by 'enrichplot':
  method               from
  fortify.enrichResult DOSE
clusterProfiler v3.14.3  For help: https://guangchuangyu.github.io/software/clusterProfiler

If you use clusterProfiler in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
Registering fonts with R

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout

Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap,
    parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite
    Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: e1071

Attaching package: ‘widgetTools’

The following object is masked from ‘package:dplyr’:

    funs


Attaching package: ‘DynDoc’

The following object is masked from ‘package:BiocGenerics’:

    path


Attaching package: ‘DT’

The following object is masked from ‘package:Seurat’:

    JS

========================================
circlize version 0.4.13
CRAN page: https://cran.r-project.org/package=circlize
Github page: https://github.com/jokergoo/circlize
Documentation: https://jokergoo.github.io/circlize_book/book/

If you use it in published research, please cite:
Gu, Z. circlize implements and enhances circular visualization
  in R. Bioinformatics 2014.

This message can be suppressed by:
  suppressPackageStartupMessages(library(circlize))
========================================

Loading required package: grid
========================================
ComplexHeatmap version 2.2.0
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference

If you use it in published research, please cite:
Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
  genomic data. Bioinformatics 2016.
========================================


Attaching package: ‘ComplexHeatmap’

The following object is masked from ‘package:plotly’:

    add_heatmap
library(xgboost)

Attaching package: ‘xgboost’

The following object is masked from ‘package:plotly’:

    slice

The following object is masked from ‘package:dplyr’:

    slice
library(Matrix)

Attaching package: ‘Matrix’

The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack
library(mclust)
    __  ___________    __  _____________
   /  |/  / ____/ /   / / / / ___/_  __/
  / /|_/ / /   / /   / / / /\__ \ / /   
 / /  / / /___/ /___/ /_/ /___/ // /    
/_/  /_/\____/_____/\____//____//_/    version 5.4.9
Type 'citation("mclust")' for citing this R package in publications.
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
Registered S3 method overwritten by 'cli':
  method     from    
  print.boxx spatstat
─ Attaching packages ─────────────────────────────── tidyverse 1.3.1 ─
✓ tibble  3.1.5     ✓ stringr 1.4.0
✓ readr   2.0.2     ✓ forcats 0.5.1
✓ purrr   0.3.4     
─ Conflicts ───────────────────────────────── tidyverse_conflicts() ─
x Biobase::combine()       masks BiocGenerics::combine(), dplyr::combine()
x Matrix::expand()         masks tidyr::expand()
x plotly::filter()         masks dplyr::filter(), stats::filter()
x widgetTools::funs()      masks dplyr::funs()
x dplyr::lag()             masks stats::lag()
x purrr::map()             masks mclust::map()
x Matrix::pack()           masks tidyr::pack()
x BiocGenerics::Position() masks ggplot2::Position(), base::Position()
x purrr::simplify()        masks clusterProfiler::simplify()
x xgboost::slice()         masks plotly::slice(), dplyr::slice()
x Matrix::unpack()         masks tidyr::unpack()

Quantitative classification

Train the classifier on ds2


ds2_data <- get_data_table(ds2, highvar = F, type = "data")
ds2_label <- as.numeric(as.character(Idents(ds2)))

index <- c(1:dim(ds2_data)[2]) %>% sample(ceiling(0.3*dim(ds2_data)[2]), replace = F, prob = NULL)
colnames(ds2_data) <- NULL

ds2_train_data <- list(data = t(as(ds2_data[,-index],"dgCMatrix")), label = ds2_label[-index])
ds2_test_data <- list(data = t(as(ds2_data[,index],"dgCMatrix")), label = ds2_label[index])

ds2_train <- xgb.DMatrix(data = ds2_train_data$data,label = ds2_train_data$label)
ds2_test <- xgb.DMatrix(data = ds2_test_data$data,label = ds2_test_data$label)

watchlist <- list(train = ds2_train, eval = ds2_test)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds2))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, ds2_train, nrounds = 100, watchlist, verbose = 0)

eval_loss <- bst_model[["evaluation_log"]][["eval_mlogloss"]]
plot_ly(data.frame(eval_loss), x = c(1:100), y = eval_loss) %>% 
  add_trace(type = "scatter", mode = "markers+lines", 
            marker = list(color = "black", line = list(color = "#1E90FFC7", width = 1)),
            line = list(color = "#1E90FF80", width = 2)) %>% 
  layout(xaxis = list(title = "epoch"),yaxis = list(title = "eval_mlogloss"))

ds2 -> ds1

Idents(ds1) <- ds1$seurat_clusters
temp <- get_data_table(ds1, highvar = F, type = "data")
ds1_data <- matrix(data=0, nrow = length(rownames(ds2_data)), ncol = length(colnames(temp)), 
                   byrow = FALSE, dimnames = list(rownames(ds2_data),colnames(temp)))
for(i in intersect(rownames(ds2_data), rownames(temp))){
  ds1_data[i,] <- temp[i,]
}
rm(temp)
ds1_label <- as.numeric(as.character(Idents(ds1)))
colnames(ds1_data) <- NULL
ds1_test_data <- list(data = t(as(ds1_data,"dgCMatrix")), label = ds1_label)
ds1_test <- xgb.DMatrix(data = ds1_test_data$data,label = ds1_test_data$label)

# Prediction results

predict_ds1_test <- predict(bst_model, newdata = ds1_test)

predict_prop_ds1 <- matrix(data=predict_ds1_test, nrow = length(levels(Idents(ds2))), 
                           ncol = ncol(ds1), byrow = FALSE, 
                           dimnames = list(levels(Idents(ds2)),colnames(ds1)))

## Get the clustering result
ds1_res <- apply(predict_prop_ds1,2,func,rownames(predict_prop_ds1))
Idents(ds1) <- factor(ds1_res,levels = c(0:4))
umapplot(ds1)

ds1$supclustering <- Idents(ds1) # save the supervised clustering result

Project the predicted probabilities back onto the UMAP

embedding <- FetchData(object = ds1, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds1))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`), size = 3, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`), size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`), size = 3, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
   new_scale("color") +
      geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`), size = 3, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
    new_scale("color") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("pre_ds1_umap.svg",device = svg,plot = ggobj,height = 10,width = 10)

ds2 -> ds0

embedding <- FetchData(object = ds0, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds0))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
   new_scale("color") +
      geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
    new_scale("color") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("pre_ds0_umap.svg",device = svg,plot = ggobj,height = 10,width = 10)

PA -> AC

Idents(ds2_PA) <- ds2_PA$seurat_clusters
selected_features <- read.csv("./datatable/selected_features.csv", stringsAsFactors = F)
selected_features <- selected_features$x
PA_data <- get_data_table(ds2_PA, highvar = F, type = "data")
PA_data <- PA_data[selected_features,]
PA_label <- as.numeric(as.character(Idents(ds2_PA)))
colnames(PA_data) <- NULL

PA_train_data <- list(data = t(as(PA_data,"dgCMatrix")), label = PA_label)
PA_train <- xgb.DMatrix(data = PA_train_data$data,label = PA_train_data$label)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds2_PA))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, PA_train, nrounds = 100, verbose = 0)
embedding <- FetchData(object = ds2_AC, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_AC))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds2_PAtoAC_umap.svg",device = svg,plot = ggobj,height = 8,width = 8)

AC to PA

Idents(ds2_AC) <- ds2_AC$seurat_clusters
selected_features <- read.csv("./datatable/selected_features.csv", stringsAsFactors = F)
selected_features <- selected_features$x
AC_data <- get_data_table(ds2_AC, highvar = F, type = "data")
AC_data <- AC_data[selected_features,]
AC_label <- as.numeric(as.character(Idents(ds2_AC)))
colnames(AC_data) <- NULL

AC_train_data <- list(data = t(as(AC_data,"dgCMatrix")), label = AC_label)
AC_train <- xgb.DMatrix(data = AC_train_data$data,label = AC_train_data$label)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds2_AC))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, AC_train, nrounds = 100, verbose = 0)
Idents(ds2_PA) <- factor(ds2_PA$seurat_clusters,levels = c(0,1,2))

PA_data <- get_data_table(ds2_PA, highvar = F, type = "data")
PA_data <- PA_data[selected_features,]
PA_label <- as.numeric(as.character(Idents(ds2_PA)))
colnames(PA_data) <- NULL
PA_test_data <- list(data = t(as(PA_data,"dgCMatrix")), label = PA_label)
PA_test <- xgb.DMatrix(data = PA_test_data$data,label = PA_test_data$label)

# Prediction results
predict_prop_PA <-predict(bst_model, newdata = PA_test) %>%
 matrix(nrow = length(levels(Idents(ds2_AC))), 
                           ncol = ncol(ds2_PA), byrow = FALSE, 
                           dimnames = list(levels(Idents(ds2_AC)),colnames(ds2_PA)))
PA_res <- apply(predict_prop_PA,2,func,rownames(predict_prop_PA))

confuse_matrix1 <- table(PA_test_data$label, PA_res, dnn=c("true","pre"))
sankey_plot(confuse_matrix1,session = "ACtoPA")

Idents(ds2_PA) <- factor(PA_res)
umapplot(ds2_PA)

embedding <- FetchData(object = ds2_PA, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_PA))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds2_ACtoPA_umap.svg",device = svg,plot = ggobj,height = 8,width = 8)
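`sankey_plot()` is defined in `tianfengRwrappers.R`; the same true-vs-predicted table can also be summarised numerically. The base-R sketch below (`summarise_confusion()` is a hypothetical helper) row-normalises the table so each true cluster's row shows where its cells were assigned, and reports the predicted cluster that receives most of each true cluster. If PA and AC were clustered separately, identical cluster numbers need not denote the same population, so the best match is more informative than the diagonal.

```r
# Hypothetical helper: summarise a true-vs-predicted confusion table.
summarise_confusion <- function(true, pred) {
  cm <- table(true = true, pred = pred)
  row_frac <- prop.table(cm, margin = 1)                # each row sums to 1
  best_match <- colnames(cm)[apply(cm, 1, which.max)]   # dominant predicted cluster per true cluster
  names(best_match) <- rownames(cm)
  list(table = cm, row_frac = row_frac, best_match = best_match)
}

# Toy labels, made up for illustration
true <- c(0, 0, 0, 1, 1, 2, 2, 2)
pred <- c(1, 1, 0, 0, 0, 2, 2, 1)
s <- summarise_confusion(true, pred)
s$best_match
round(s$row_frac, 2)
```

With the notebook's objects this would be called as `summarise_confusion(PA_test_data$label, PA_res)`.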

Train on ds0

Idents(ds0) <- ds0$seurat_clusters
ds0_data <- get_data_table(ds0, highvar = F, type = "data")
ds0_label <- as.numeric(as.character(Idents(ds0)))

index <- c(1:dim(ds0_data)[2]) %>% sample(ceiling(0.3*dim(ds0_data)[2]), replace = F, prob = NULL)
colnames(ds0_data) <- NULL

ds0_train_data <- list(data = t(as(ds0_data[,-index],"dgCMatrix")), label = ds0_label[-index])
ds0_test_data <- list(data = t(as(ds0_data[,index],"dgCMatrix")), label = ds0_label[index])

ds0_train <- xgb.DMatrix(data = ds0_train_data$data,label = ds0_train_data$label)
ds0_test <- xgb.DMatrix(data = ds0_test_data$data,label = ds0_test_data$label)

watchlist <- list(train = ds0_train, eval = ds0_test)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds0))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, ds0_train, nrounds = 100, watchlist, verbose = 0)

eval_loss <- bst_model[["evaluation_log"]][["eval_mlogloss"]]
plot_ly(data.frame(eval_loss), x = c(1:100), y = eval_loss) %>% 
  add_trace(type = "scatter", mode = "markers+lines", 
            marker = list(color = "black", line = list(color = "#1E90FFC7", width = 1)),
            line = list(color = "#1E90FF80", width = 2)) %>% 
  layout(xaxis = list(title = "epoch"),yaxis = list(title = "eval_mlogloss"))
importance <- xgb.importance(colnames(ds0_train), model = bst_model)
head(importance)
xgb.ggplot.importance(head(importance,20),n_clusters = 1) + theme_bw()+theme(
    axis.title.x = element_text(size = 15), axis.text.x = element_text(size = 8, colour = "black"),
    axis.title.y = element_text(size = 15), axis.text.y = element_text(size = 12, colour = "black"),
    legend.text = element_text(size = 20), legend.title = element_blank(), panel.grid = element_blank())

write.csv(importance, "./datatable/ds0_features.csv", row.names = F)
multi_featureplot(head(importance,9)$Feature, ds0, labels = "") 
Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
Please use `as_label()` or `as_name()` instead.
This warning is displayed once per session.
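The `eval_mlogloss` curve plotted above can also be used to choose the number of boosting rounds instead of fixing `nrounds = 100`. `xgb.train()` has an `early_stopping_rounds` argument for this; the base-R sketch below only illustrates the rule (stop once the held-out loss has not improved for `patience` rounds) on a loss vector such as `eval_loss`. `early_stop_round()` is a hypothetical helper. If the held-out cells are used to pick the round, they no longer give an unbiased accuracy estimate.

```r
# Hypothetical helper: replay the early-stopping rule on a recorded loss curve.
early_stop_round <- function(eval_loss, patience = 10) {
  stopifnot(length(eval_loss) > 0)
  best_iter <- 1
  for (i in seq_along(eval_loss)) {
    if (eval_loss[i] < eval_loss[best_iter]) best_iter <- i  # new lowest loss
    if (i - best_iter >= patience) break                     # no improvement for `patience` rounds
  }
  list(best_iter = best_iter, stopped_at = i)
}

# Toy loss curve, made up for illustration
loss <- c(1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40)
early_stop_round(loss, patience = 3)
```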

ds0 -> ds2

Idents(ds2) <- ds2$seurat_clusters 
temp <- get_data_table(ds2, highvar = F, type = "data")
ds2_data <- matrix(data=0, nrow = length(rownames(ds0_data)), ncol = length(colnames(temp)), 
                   byrow = FALSE, dimnames = list(rownames(ds0_data),colnames(temp)))
for(i in intersect(rownames(ds2_data), rownames(temp))){
  ds2_data[i,] <- temp[i,]
}
rm(temp)
ds2_label <- as.numeric(as.character(Idents(ds2)))
colnames(ds2_data) <- NULL
ds2_test_data <- list(data = t(as(ds2_data,"dgCMatrix")), label = ds2_label)
ds2_test <- xgb.DMatrix(data = ds2_test_data$data,label = ds2_test_data$label)

# Prediction results

predict_ds2_test <- predict(bst_model, newdata = ds2_test)

predict_prop_ds2 <- matrix(data=predict_ds2_test, nrow = bst_model[["params"]][["num_class"]], 
                           ncol = ncol(ds2), byrow = FALSE, 
                           dimnames = list(c(0:(bst_model[["params"]][["num_class"]]-1)),colnames(ds2)))

## Get the clustering result
ds2_res <- apply(predict_prop_ds2,2,func,rownames(predict_prop_ds2))
confuse_matrix1 <- table(ds2_test_data$label, ds2_res, dnn=c("true","pre"))

sankey_plot(confuse_matrix1,0:5,0:4,session = "ds0tods2")

Idents(ds2) <- factor(ds2_res,levels = c(0:5))
umapplot(ds2)

embedding <- FetchData(object = ds2, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds2))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`5`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `5`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('5', low = "#FFFFFF00", high = "#fd9999") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds0tods2umap.svg",device = svg,plot = ggobj,height = 8,width = 8)

ds0 -> ds1

embedding <- FetchData(object = ds1, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds1))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`5`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `5`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('5', low = "#FFFFFF00", high = "#fd9999") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds0tods1umap.svg",device = svg,plot = ggobj,height = 8,width = 8)

lym

Relationship between ARI and the number of clusters
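The adjusted Rand index (ARI) measures agreement between two labelings of the same cells, corrected for chance: 1 means identical partitions up to relabelling, and values near 0 mean chance-level agreement. A minimal base-R sketch computed from the contingency table is below; `mclust::adjustedRandIndex()` (mclust is loaded above) computes the same quantity. The toy `truth` and `candidates` labelings are made up for illustration; in this notebook one might compare, for example, `ds1$supclustering` against unsupervised clusterings at several resolutions.

```r
# Adjusted Rand index (Hubert & Arabie) from the contingency table.
# Undefined (NaN) if both labelings put all cells in a single cluster.
ari <- function(x, y) {
  tab <- table(x, y)
  comb2 <- function(n) n * (n - 1) / 2       # number of unordered pairs
  sum_ij <- sum(comb2(tab))                  # pairs grouped together in both labelings
  sum_a  <- sum(comb2(rowSums(tab)))         # pairs grouped together in x
  sum_b  <- sum(comb2(colSums(tab)))         # pairs grouped together in y
  expected  <- sum_a * sum_b / comb2(sum(tab))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# ARI of candidate clusterings with different numbers of clusters
truth <- c(1, 1, 1, 2, 2, 2)
candidates <- list(k2 = c(1, 1, 1, 2, 2, 2), k3 = c(1, 1, 2, 2, 3, 3))
data.frame(n_clusters = sapply(candidates, function(z) length(unique(z))),
           ari = sapply(candidates, ari, y = truth))
```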

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

---
title: "R Notebook"
output: html_notebook
---

```{r}
source("tianfengRwrappers.R")
library(xgboost)
library(Matrix)
library(mclust)
library(tidyverse)
```


## Quantitative classification
### Train the classifier on ds2
```{r}
ds2_data <- get_data_table(ds2, highvar = F, type = "data")
ds2_label <- as.numeric(as.character(Idents(ds2)))

index <- c(1:dim(ds2_data)[2]) %>% sample(ceiling(0.3*dim(ds2_data)[2]), replace = F, prob = NULL)
colnames(ds2_data) <- NULL

ds2_train_data <- list(data = t(as(ds2_data[,-index],"dgCMatrix")), label = ds2_label[-index])
ds2_test_data <- list(data = t(as(ds2_data[,index],"dgCMatrix")), label = ds2_label[index])

ds2_train <- xgb.DMatrix(data = ds2_train_data$data,label = ds2_train_data$label)
ds2_test <- xgb.DMatrix(data = ds2_test_data$data,label = ds2_test_data$label)

watchlist <- list(train = ds2_train, eval = ds2_test)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds2))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, ds2_train, nrounds = 100, watchlist, verbose = 0)
saveRDS(bst_model, "ds2_model.rds")
eval_loss <- bst_model[["evaluation_log"]][["eval_mlogloss"]]
plot_ly(data.frame(eval_loss), x = c(1:100), y = eval_loss) %>% 
  add_trace(type = "scatter", mode = "markers+lines", 
            marker = list(color = "black", line = list(color = "#1E90FFC7", width = 1)),
            line = list(color = "#1E90FF80", width = 2)) %>% 
  layout(xaxis = list(title = "epoch"),yaxis = list(title = "eval_mlogloss"))
```
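The 30% hold-out above is drawn uniformly over all cells, so a small cluster can end up under-represented in `ds2_test`. A minimal sketch of a stratified alternative in base R follows; `stratified_test_index()` is a hypothetical helper, not part of `tianfengRwrappers.R`, and the fixed seed is an assumption added for reproducibility.

```r
# Hypothetical helper: stratified test-set indices, about `test_frac` of each cluster.
stratified_test_index <- function(labels, test_frac = 0.3, seed = 1) {
  set.seed(seed)                                    # note: resets the global RNG state
  idx_by_class <- split(seq_along(labels), labels)  # cell positions grouped by cluster
  test_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_test <- ceiling(test_frac * length(idx))
    idx[sample.int(length(idx), n_test)]            # index into idx; sample(idx) misbehaves when length(idx) == 1
  }), use.names = FALSE)
  sort(test_idx)
}

# Toy labels, made up for illustration
labels <- c(rep(0, 10), rep(1, 20), rep(2, 5))
test_idx <- stratified_test_index(labels)
table(labels[test_idx])
```

The result can stand in for `index` (e.g. `index <- stratified_test_index(ds2_label)`), since the chunk only uses `index` and `-index` to select columns.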

```{r fig.height=6,fig.width=6}
importance <- xgb.importance(colnames(ds2_train), model = bst_model)
head(importance)
xgb.ggplot.importance(head(importance,20),n_clusters = 1) + theme_bw()+theme(
    axis.title.x = element_text(size = 15), axis.text.x = element_text(size = 8, colour = "black"),
    axis.title.y = element_text(size = 15), axis.text.y = element_text(size = 12, colour = "black"),
    legend.text = element_text(size = 20), legend.title = element_blank(), panel.grid = element_blank())
```


## ds2 -> ds1
```{r}
Idents(ds1) <- ds1$seurat_clusters
temp <- get_data_table(ds1, highvar = F, type = "data")
ds1_data <- matrix(data=0, nrow = length(rownames(ds2_data)), ncol = length(colnames(temp)), 
                   byrow = FALSE, dimnames = list(rownames(ds2_data),colnames(temp)))
for(i in intersect(rownames(ds2_data), rownames(temp))){
  ds1_data[i,] <- temp[i,]
}
rm(temp)
ds1_label <- as.numeric(as.character(Idents(ds1)))
colnames(ds1_data) <- NULL
ds1_test_data <- list(data = t(as(ds1_data,"dgCMatrix")), label = ds1_label)
ds1_test <- xgb.DMatrix(data = ds1_test_data$data,label = ds1_test_data$label)

# Prediction results

predict_ds1_test <- predict(bst_model, newdata = ds1_test)

predict_prop_ds1 <- matrix(data=predict_ds1_test, nrow = length(levels(Idents(ds2))), 
                           ncol = ncol(ds1), byrow = FALSE, 
                           dimnames = list(levels(Idents(ds2)),colnames(ds1)))

## Get the clustering result
ds1_res <- apply(predict_prop_ds1,2,func,rownames(predict_prop_ds1))
Idents(ds1) <- factor(ds1_res,levels = c(0:4))
umapplot(ds1)
ds1$supclustering <- Idents(ds1) # save the supervised clustering result
```
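The loop above copies each shared gene into a zero-filled matrix whose rows follow the training gene order, so genes missing from ds1 are treated as zero expression. The same alignment can be written as one vectorised assignment. The sketch assumes a genes x cells input (a dense matrix or data frame); `align_features()` is a hypothetical name.

```r
# Hypothetical helper: reorder a genes x cells matrix to the training gene order,
# filling genes that are absent from the query with 0.
align_features <- function(query, train_genes) {
  aligned <- matrix(0, nrow = length(train_genes), ncol = ncol(query),
                    dimnames = list(train_genes, colnames(query)))
  shared <- intersect(train_genes, rownames(query))
  aligned[shared, ] <- as.matrix(query[shared, , drop = FALSE])
  aligned
}

# Toy query: GeneX is not a training gene, GeneA is missing from the query
query <- matrix(1:6, nrow = 3,
                dimnames = list(c("GeneB", "GeneC", "GeneX"), c("cell1", "cell2")))
aligned <- align_features(query, c("GeneA", "GeneB", "GeneC"))
aligned
```

In the chunk above this would read `ds1_data <- align_features(temp, rownames(ds2_data))`.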

## Project the predicted probabilities back onto the UMAP
```{r}
embedding <- FetchData(object = ds1, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds1))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
   new_scale("color") +
      geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
    new_scale("color") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("pre_ds1_umap.svg",device = svg,plot = ggobj,height = 10,width = 10)
```
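Each `geom_point()` layer above keeps only the cells whose probability for that cluster exceeds 0.1, and `new_scale("color")` gives every cluster its own gradient. The filtering step can be made explicit by reshaping a classes x cells probability matrix (such as `predict_prop_ds1`) into a long table. `prob_long()` is a hypothetical base-R helper; the 0.1 default simply mirrors the threshold used in the plot.

```r
# Hypothetical helper: classes x cells probability matrix -> long table with one
# row per (cell, class) pair, keeping only pairs above `threshold`.
prob_long <- function(prob, threshold = 0.1) {
  long <- data.frame(
    cell  = rep(colnames(prob), each = nrow(prob)),   # matches column-major order of as.vector()
    class = rep(rownames(prob), times = ncol(prob)),
    prob  = as.vector(prob),
    stringsAsFactors = FALSE
  )
  long[long$prob > threshold, , drop = FALSE]
}

# Toy probabilities, made up for illustration
prob <- matrix(c(0.7, 0.25, 0.05,
                 0.05, 0.05, 0.9), nrow = 3,
               dimnames = list(c("0", "1", "2"), c("cellA", "cellB")))
res <- prob_long(prob)
res
```

The `cell` column can then be matched against `rownames(embedding)` to attach the UMAP coordinates. Keeping one layer per cluster, as the original code does, is what allows a separate colour gradient for each cluster.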

## ds2 -> ds0
```{r}
Idents(ds0) <- ds0$seurat_clusters
temp <- get_data_table(ds0, highvar = F, type = "data")
ds0_data <- matrix(data=0, nrow = length(rownames(ds2_data)), ncol = length(colnames(temp)), 
                   byrow = FALSE, dimnames = list(rownames(ds2_data),colnames(temp)))
for(i in intersect(rownames(ds2_data), rownames(temp))){
  ds0_data[i,] <- temp[i,]
}
rm(temp)
ds0_label <- as.numeric(as.character(Idents(ds0)))
colnames(ds0_data) <- NULL
ds0_test_data <- list(data = t(as(ds0_data,"dgCMatrix")), label = ds0_label)
ds0_test <- xgb.DMatrix(data = ds0_test_data$data,label = ds0_test_data$label)

# Prediction results

predict_ds0_test <- predict(bst_model, newdata = ds0_test)

predict_prop_ds0 <- matrix(data=predict_ds0_test, nrow = length(levels(Idents(ds2))), 
                           ncol = ncol(ds0), byrow = FALSE, 
                           dimnames = list(levels(Idents(ds2)),colnames(ds0)))

## Get the clustering result
ds0_res <- apply(predict_prop_ds0,2,func,rownames(predict_prop_ds0))
Idents(ds0) <- factor(ds0_res,levels = c(0:4))
umapplot(ds0)
ds0$supclustering <- Idents(ds0) # save the supervised clustering result
```
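`func` comes from `tianfengRwrappers.R` and is not shown here. Judging from how it is called (`apply(prob, 2, func, rownames(prob))`), it presumably returns the row name of each column's largest probability. A minimal sketch of that step, assuming the `multi:softprob` prediction vector stores the `num_class` probabilities of each cell consecutively (the layout the `byrow = FALSE` reshape above relies on); `softprob_to_labels()` is a hypothetical name:

```r
# Hypothetical stand-in for `func`: reshape the flat softprob vector into a
# classes x cells matrix and take each column's most probable class.
softprob_to_labels <- function(pred, class_names, cell_names) {
  prob <- matrix(pred, nrow = length(class_names), byrow = FALSE,
                 dimnames = list(class_names, cell_names))
  labels <- class_names[apply(prob, 2, which.max)]  # first maximum wins on ties
  names(labels) <- cell_names
  list(prob = prob, labels = labels)
}

# Toy prediction vector for 3 classes and 2 cells
pred <- c(0.7, 0.2, 0.1,   # cell1
          0.1, 0.3, 0.6)   # cell2
res <- softprob_to_labels(pred, c("0", "1", "2"), c("cell1", "cell2"))
res$labels
```

Because `which.max()` returns the first maximum, ties go to the lower-numbered cluster.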

```{r}
embedding <- FetchData(object = ds0, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds0))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
   new_scale("color") +
      geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 3, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
    new_scale("color") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("pre_ds0_umap.svg",device = svg,plot = ggobj,height = 10,width = 10)
```
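Every probability plot in this notebook repeats the same three steps per class: draw the cells above a 0.1 cutoff, attach a white-to-colour gradient, then call `new_scale("color")` so the next class gets its own colour scale. A sketch of a loop that builds the same figure for any number of classes, assuming ggplot2 and ggnewscale are installed; `plot_class_probs()` is a hypothetical name:

```r
library(ggplot2)
library(ggnewscale)

# Hypothetical helper (not in tianfengRwrappers.R): builds the layered
# probability plot for any number of classes. `embedding` needs UMAP_1, UMAP_2
# and one probability column per class, named like the classes ("0", "1", ...).
plot_class_probs <- function(embedding, classes, colours, cutoff = 0.1, size = 2) {
  p <- ggplot()
  for (k in seq_along(classes)) {
    # Keep only cells whose probability for this class exceeds the cutoff and
    # copy that probability into a fixed column so aes() needs no tidy-eval tricks
    layer_data <- embedding[embedding[[classes[k]]] > cutoff, , drop = FALSE]
    layer_data$prob <- layer_data[[classes[k]]]
    p <- p +
      geom_point(data = layer_data, aes(x = UMAP_1, y = UMAP_2, color = prob),
                 shape = 16, size = size, alpha = 0.5) +
      scale_color_gradient(name = classes[k], low = "#FFFFFF00", high = colours[k])
    # Start a fresh colour scale so the next class gets its own gradient and legend
    if (k < length(classes)) p <- p + new_scale("color")
  }
  p + xlab("UMAP 1") + ylab("UMAP 2") +
    scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL) +
    theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm"))),
          panel.background = element_blank(), panel.grid = element_blank(),
          legend.position = "bottom")
}

# Toy usage with random data
set.seed(1)
probs <- matrix(runif(30), ncol = 3, dimnames = list(NULL, c("0", "1", "2")))
toy <- data.frame(UMAP_1 = rnorm(10), UMAP_2 = rnorm(10), probs, check.names = FALSE)
p <- plot_class_probs(toy, c("0", "1", "2"), c("#6dc0a6", "#e2b398", "#e2a2ca"))
```

For the ds2 -> ds0 figure above, the call would be along the lines of `plot_class_probs(embedding, as.character(0:4), c("#6dc0a6", "#e2b398", "#e2a2ca", "#d1eba8", "#b1d6fb"), size = 3)`.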


# PA -> AC
```{r}
Idents(ds2_PA) <- ds2_PA$seurat_clusters
selected_features <- read.csv("./datatable/selected_features.csv", stringsAsFactors = F)
selected_features <- selected_features$x
PA_data <- get_data_table(ds2_PA, highvar = F, type = "data")
PA_data <- PA_data[selected_features,]
PA_label <- as.numeric(as.character(Idents(ds2_PA)))
colnames(PA_data) <- NULL

PA_train_data <- list(data = t(as(PA_data,"dgCMatrix")), label = PA_label)
PA_train <- xgb.DMatrix(data = PA_train_data$data,label = PA_train_data$label)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds2_PA))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, PA_train, nrounds = 100, verbose = 0)
```
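The labels above are built with `as.numeric(as.character(Idents(...)))` and `num_class` is set to the number of clusters. That only works when the cluster names are exactly `0 .. num_class - 1`, which is what the `multi:softprob` objective expects. If a subset such as `ds2_PA` keeps non-contiguous cluster IDs, the names and the class count no longer agree. A small base-R sketch of the difference, with a hypothetical `encode_labels()` helper that maps any factor to contiguous 0-based codes and keeps the lookup table:

```r
# Cluster IDs as Seurat might store them, with a gap ("3".."9" missing)
clusters <- factor(c("2", "0", "10", "1"), levels = c("0", "1", "2", "10"))

as.numeric(clusters)                 # 3 1 4 2  : the level positions, 1-based
as.numeric(as.character(clusters))   # 2 0 10 1 : the cluster names themselves
as.integer(clusters) - 1L            # 2 0 3 1  : contiguous 0-based codes

# Hypothetical helper: 0-based labels for xgboost plus the code -> cluster lookup
encode_labels <- function(clusters) {
  clusters <- droplevels(as.factor(clusters))
  list(label = as.integer(clusters) - 1L, classes = levels(clusters))
}

enc <- encode_labels(clusters)
enc$label    # 2 0 3 1
enc$classes  # "0" "1" "2" "10"
```

Predicted codes can then be mapped back with `enc$classes[code + 1]`.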

```{r}
Idents(ds2_AC) <- ds2_AC$seurat_clusters
AC_data <- get_data_table(ds2_AC, highvar = F, type = "data")
AC_data <- AC_data[selected_features,]
AC_label <- as.numeric(as.character(Idents(ds2_AC)))
colnames(AC_data) <- NULL
AC_test_data <- list(data = t(as(AC_data,"dgCMatrix")), label = AC_label)
AC_test <- xgb.DMatrix(data = AC_test_data$data,label = AC_test_data$label)

# Predict class probabilities
predict_prop_AC <- predict(bst_model, newdata = AC_test) %>%
  matrix(nrow = length(levels(Idents(ds2_PA))),
         ncol = ncol(ds2_AC), byrow = FALSE,
         dimnames = list(levels(Idents(ds2_PA)), colnames(ds2_AC)))
AC_res <- apply(predict_prop_AC,2,func,rownames(predict_prop_AC))

confuse_matrix1 <- table(AC_test_data$label, AC_res, dnn=c("true","pre"))
sankey_plot(confuse_matrix1,session = "PAtoAC")

Idents(ds2_AC) <- factor(AC_res,levels = c(0:2))
umapplot(ds2_AC)
```
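With `objective = "multi:softprob"`, `predict()` returns one flat vector in which the class probabilities of each cell sit next to each other. That is why the chunks above fill a `num_class × cells` matrix column-wise (`byrow = FALSE`). `func` is not shown in this notebook; the sketch below assumes it returns the row name of the largest probability in a column and uses base R's `max.col()` as a stand-in:

```r
num_class <- 3
# Flat prediction vector: num_class consecutive probabilities per cell
pred <- c(0.7, 0.2, 0.1,   # cell1
          0.1, 0.3, 0.6,   # cell2
          0.2, 0.5, 0.3)   # cell3

# Column-wise fill puts each cell's block into one column: classes x cells
prob <- matrix(pred, nrow = num_class, byrow = FALSE,
               dimnames = list(0:(num_class - 1), paste0("cell", 1:3)))

# Stand-in for `func`: the class with the highest probability for each cell.
# max.col() works row-wise, so transpose to cells x classes first.
assigned <- rownames(prob)[max.col(t(prob), ties.method = "first")]
assigned   # "0" "2" "1"
```

In the 1.x R package, `predict(bst_model, newdata, reshape = TRUE)` returns the cells × classes matrix directly; check the installed xgboost version before relying on that argument.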

```{r}
embedding <- FetchData(object = ds2_AC, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_AC))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds2_PAtoAC_umap.svg",device = svg,plot = ggobj,height = 8,width = 8)
```



## AC to PA
```{r}
Idents(ds2_AC) <- ds2_AC$seurat_clusters
selected_features <- read.csv("./datatable/selected_features.csv", stringsAsFactors = F)
selected_features <- selected_features$x
AC_data <- get_data_table(ds2_AC, highvar = F, type = "data")
AC_data <- AC_data[selected_features,]
AC_label <- as.numeric(as.character(Idents(ds2_AC)))
colnames(AC_data) <- NULL

AC_train_data <- list(data = t(as(AC_data,"dgCMatrix")), label = AC_label)
AC_train <- xgb.DMatrix(data = AC_train_data$data,label = AC_train_data$label)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds2_AC))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, AC_train, nrounds = 100, verbose = 0)
```

```{r}
Idents(ds2_PA) <- factor(ds2_PA$seurat_clusters,levels = c(0,1,2))

PA_data <- get_data_table(ds2_PA, highvar = F, type = "data")
PA_data <- PA_data[selected_features,]
PA_label <- as.numeric(as.character(Idents(ds2_PA)))
colnames(PA_data) <- NULL
PA_test_data <- list(data = t(as(PA_data,"dgCMatrix")), label = PA_label)
PA_test <- xgb.DMatrix(data = PA_test_data$data,label = PA_test_data$label)

# Predict class probabilities
predict_prop_PA <- predict(bst_model, newdata = PA_test) %>%
  matrix(nrow = length(levels(Idents(ds2_AC))),
         ncol = ncol(ds2_PA), byrow = FALSE,
         dimnames = list(levels(Idents(ds2_AC)), colnames(ds2_PA)))
PA_res <- apply(predict_prop_PA,2,func,rownames(predict_prop_PA))

confuse_matrix1 <- table(PA_test_data$label, PA_res, dnn=c("true","pre"))
sankey_plot(confuse_matrix1,session = "ACtoPA")

Idents(ds2_PA) <- factor(PA_res)
umapplot(ds2_PA)
```
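Because PA and AC share the same cluster IDs, the confusion matrix passed to `sankey_plot()` can also be summarised as a single agreement rate, and flattened into one row per true -> predicted flow, a common input shape for Sankey or alluvial plots. A base-R sketch on toy labels (the values are illustrative, not results from this data):

```r
true <- c(0, 0, 1, 1, 2, 2)
pred <- c("0", "0", "1", "2", "2", "2")
cm <- table(true, pred, dnn = c("true", "pre"))

# Fraction of cells whose predicted cluster has the same ID as their own cluster;
# only meaningful when rows and columns use the same label set
shared <- intersect(rownames(cm), colnames(cm))
accuracy <- sum(cm[cbind(shared, shared)]) / sum(cm)
accuracy  # 0.8333333

# Long format: one row per non-empty true -> predicted flow
flows <- as.data.frame(cm, responseName = "n")
flows <- flows[flows$n > 0, ]
```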

```{r}
embedding <- FetchData(object = ds2_PA, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_PA))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds2_ACtoPA_umap.svg",device = svg,plot = ggobj,height = 8,width = 8)
```


## Train on ds0
```{r}
Idents(ds0) <- ds0$seurat_clusters
ds0_data <- get_data_table(ds0, highvar = F, type = "data")
ds0_label <- as.numeric(as.character(Idents(ds0)))

# Hold out 30% of the cells as the evaluation set
index <- sample(ncol(ds0_data), ceiling(0.3 * ncol(ds0_data)))
colnames(ds0_data) <- NULL

ds0_train_data <- list(data = t(as(ds0_data[,-index],"dgCMatrix")), label = ds0_label[-index])
ds0_test_data <- list(data = t(as(ds0_data[,index],"dgCMatrix")), label = ds0_label[index])

ds0_train <- xgb.DMatrix(data = ds0_train_data$data,label = ds0_train_data$label)
ds0_test <- xgb.DMatrix(data = ds0_test_data$data,label = ds0_test_data$label)

watchlist <- list(train = ds0_train, eval = ds0_test)
xgb_param <- list(eta = 0.2, max_depth = 6, 
                  subsample = 0.6,  num_class = length(table(Idents(ds0))),
                  objective = "multi:softprob", eval_metric = 'mlogloss')

bst_model <- xgb.train(xgb_param, ds0_train, nrounds = 100, watchlist = watchlist, verbose = 0)

eval_loss <- bst_model[["evaluation_log"]][["eval_mlogloss"]]
plot_ly(data.frame(eval_loss), x = seq_along(eval_loss), y = eval_loss) %>% 
  add_trace(type = "scatter", mode = "markers+lines", 
            marker = list(color = "black", line = list(color = "#1E90FFC7", width = 1)),
            line = list(color = "#1E90FF80", width = 2)) %>% 
  layout(xaxis = list(title = "epoch"),yaxis = list(title = "eval_mlogloss"))
```
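The hold-out set above is drawn without a seed, so the split and the loss curve change on every run, and small clusters can end up barely represented in the evaluation set. A sketch of a reproducible, stratified alternative (30% sampled within each cluster). This is a swapped-in technique rather than the split used above, and `stratified_holdout()` is a hypothetical name:

```r
# Hypothetical helper: indices of a stratified hold-out set. Note that
# set.seed() changes the global RNG state as a side effect.
stratified_holdout <- function(labels, frac = 0.3, seed = 1) {
  set.seed(seed)
  by_class <- split(seq_along(labels), labels)
  held <- lapply(by_class, function(i) {
    # i[sample.int(...)] avoids sample()'s surprise when length(i) == 1
    i[sample.int(length(i), ceiling(frac * length(i)))]
  })
  sort(unlist(held, use.names = FALSE))
}

labels <- rep(0:2, times = c(10, 20, 5))
index <- stratified_holdout(labels, frac = 0.3, seed = 42)
table(labels[index])   # 3, 6 and 2 cells from clusters 0, 1 and 2
```

It could replace the `sample()` call as `index <- stratified_holdout(ds0_label)`, leaving the `[, -index]` / `[, index]` subsetting unchanged.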

```{r fig.width=6,fig.height=6}
importance <- xgb.importance(colnames(ds0_train), model = bst_model)
head(importance)
xgb.ggplot.importance(head(importance,20),n_clusters = 1) + theme_bw()+theme(
    axis.title.x = element_text(size = 15), axis.text.x = element_text(size = 8, colour = "black"),
    axis.title.y = element_text(size = 15), axis.text.y = element_text(size = 12, colour = "black"),
    legend.text = element_text(size = 20), legend.title = element_blank(), panel.grid = element_blank())
write.csv(importance, "./datatable/ds0_features.csv", row.names = F)
multi_featureplot(head(importance,9)$Feature, ds0, labels = "") 
```
## ds0 -> ds2
```{r}
Idents(ds2) <- ds2$seurat_clusters 
temp <- get_data_table(ds2, highvar = F, type = "data")
ds2_data <- matrix(data=0, nrow = length(rownames(ds0_data)), ncol = length(colnames(temp)), 
                   byrow = FALSE, dimnames = list(rownames(ds0_data),colnames(temp)))
for(i in intersect(rownames(ds2_data), rownames(temp))){
  ds2_data[i,] <- temp[i,]
}
rm(temp)
ds2_label <- as.numeric(as.character(Idents(ds2)))
colnames(ds2_data) <- NULL
ds2_test_data <- list(data = t(as(ds2_data,"dgCMatrix")), label = ds2_label)
ds2_test <- xgb.DMatrix(data = ds2_test_data$data,label = ds2_test_data$label)

# Predict class probabilities

predict_ds2_test <- predict(bst_model, newdata = ds2_test)

predict_prop_ds2 <- matrix(data=predict_ds2_test, nrow = bst_model[["params"]][["num_class"]], 
                           ncol = ncol(ds2), byrow = FALSE, 
                           dimnames = list(c(0:(bst_model[["params"]][["num_class"]]-1)),colnames(ds2)))

## Get cluster assignments
ds2_res <- apply(predict_prop_ds2,2,func,rownames(predict_prop_ds2))
confuse_matrix1 <- table(ds2_test_data$label, ds2_res, dnn=c("true","pre"))

sankey_plot(confuse_matrix1,0:5,0:4,session = "ds0tods2")

Idents(ds2) <- factor(ds2_res,levels = c(0:5))
umapplot(ds2)

```
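In the ds0 -> ds2 transfer the rows of the confusion matrix are ds2 clusters and the columns are ds0 clusters, so the diagonal is not an accuracy. Normalising each row shows where each ds2 cluster lands, and the row-wise maximum gives a best-matching ds0 cluster. A base-R sketch on toy labels (illustrative values only):

```r
# ds2 clusters (rows) vs clusters predicted by the ds0 model (columns)
cm <- table(true = c(0, 0, 0, 1, 1, 2), pre = c("3", "3", "1", "1", "1", "0"))

# Share of each true cluster assigned to each predicted cluster; rows sum to 1
mapping <- prop.table(cm, margin = 1)
round(mapping, 2)

# Best-matching predicted cluster for every true cluster
best <- colnames(mapping)[max.col(unclass(mapping), ties.method = "first")]
names(best) <- rownames(mapping)
best   # 0 -> "3", 1 -> "1", 2 -> "0"
```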

```{r}
embedding <- FetchData(object = ds2, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds2))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`5`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `5`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('5', low = "#FFFFFF00", high = "#fd9999") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds0tods2umap.svg",device = svg,plot = ggobj,height = 8,width = 8)
```

## ds0 -> ds1
```{r}
Idents(ds1) <- ds1$seurat_clusters
temp <- get_data_table(ds1, highvar = F, type = "data")
ds1_data <- matrix(data=0, nrow = length(rownames(ds0_data)), ncol = length(colnames(temp)), 
                   byrow = FALSE, dimnames = list(rownames(ds0_data),colnames(temp)))
for(i in intersect(rownames(ds1_data), rownames(temp))){
  ds1_data[i,] <- temp[i,]
}
rm(temp)
ds1_label <- as.numeric(as.character(Idents(ds1)))
colnames(ds1_data) <- NULL
ds1_test_data <- list(data = t(as(ds1_data,"dgCMatrix")), label = ds1_label)
ds1_test <- xgb.DMatrix(data = ds1_test_data$data,label = ds1_test_data$label)

# Predict class probabilities

predict_ds1_test <- predict(bst_model, newdata = ds1_test)

predict_prop_ds1 <- matrix(data=predict_ds1_test, nrow = bst_model[["params"]][["num_class"]], 
                           ncol = ncol(ds1), byrow = FALSE, 
                           dimnames = list(c(0:(bst_model[["params"]][["num_class"]]-1)),colnames(ds1)))

## Get cluster assignments
ds1_res <- apply(predict_prop_ds1,2,func,rownames(predict_prop_ds1))
Idents(ds1) <- factor(ds1_res,levels = c(0:5))
umapplot(ds1)

confuse_matrix <- table(ds1_test_data$label, ds1_res, dnn=c("true","pre"))
sankey_plot(confuse_matrix,c(0:4),c(0:4),session = "ds0tods1")
```

```{r}
embedding <- FetchData(object = ds1, vars = c("UMAP_1", "UMAP_2"))
embedding <- cbind(embedding, t(predict_prop_ds1))

ggobj <- ggplot() +
  geom_point(data = embedding[embedding$`0`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `0`), shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('0', low = "#FFFFFF00", high = "#6dc0a6") +
  new_scale("color") +
    geom_point(data = embedding[embedding$`1`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `1`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('1', low = "#FFFFFF00", high = "#e2b398") +
   new_scale("color") +
    geom_point(data = embedding[embedding$`2`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `2`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('2', low = "#FFFFFF00", high = "#e2a2ca") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`3`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `3`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('3', low = "#FFFFFF00", high = "#d1eba8") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`4`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `4`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('4', low = "#FFFFFF00", high = "#b1d6fb") +
     new_scale("color") +
    geom_point(data = embedding[embedding$`5`>0.1,], 
             aes(x = UMAP_1, y = UMAP_2, color = `5`),shape=16, size = 2, alpha=0.5) + 
  scale_color_gradient('5', low = "#FFFFFF00", high = "#fd9999") +
        xlab("UMAP 1") + ylab("UMAP 2")  +
        theme(axis.line = element_line(arrow = arrow(length = unit(0.2, "cm")))) +
        scale_y_continuous(breaks = NULL) +
        scale_x_continuous(breaks = NULL) + 
  theme(panel.background = element_blank(), panel.grid = element_blank(), legend.position = "bottom")
ggsave("ds0tods1umap.svg",device = svg,plot = ggobj,height = 8,width = 8)
```



## lym
```{r}

```


## Relationship between ARI and the number of clusters
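This section has no code yet. As a starting point, the adjusted Rand index (ARI) between two partitions can be computed from their contingency table: it compares the number of co-clustered cell pairs with the number expected by chance, giving 1 for identical partitions (up to relabelling) and values near 0 for random ones. A base-R sketch; `adjusted_rand_index()` is a hypothetical name, and `mclust::adjustedRandIndex()` computes the same quantity if that package is available:

```r
adjusted_rand_index <- function(x, y) {
  tab <- table(x, y)
  comb2 <- function(n) n * (n - 1) / 2          # number of pairs, "n choose 2"
  sum_ij <- sum(comb2(tab))                     # pairs together in both partitions
  sum_a <- sum(comb2(rowSums(tab)))             # pairs together in x
  sum_b <- sum(comb2(colSums(tab)))             # pairs together in y
  expected <- sum_a * sum_b / comb2(length(x))  # expected sum_ij under chance
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Identical partitions with different label names
adjusted_rand_index(c(1, 1, 2, 2, 3, 3), c("a", "a", "b", "b", "c", "c"))  # 1
# Partially overlapping partitions
adjusted_rand_index(c(0, 0, 0, 1, 1, 1), c(0, 0, 1, 1, 2, 2))
```

For example, `adjusted_rand_index(ds0$seurat_clusters, ds0$supclustering)` would compare the unsupervised and supervised labels stored earlier in this notebook; repeating it over clusterings of different resolution would give the ARI-versus-cluster-number curve this heading refers to.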
